Disclaimer
Any information provided by me in this document is my opinion and is for informational purposes only. It does not necessarily reflect the views of my employer. All efforts have been made to report true and accurate information. However, the information could become materially inaccurate without warning. Any opinions expressed herein are subject to change. None of this information is intended to be used as a primary basis of an investment decision, nor should it be construed as advice or a recommendation.
Robinhood is a fast-growing FINRA broker-dealer whose commission-free trading app has attracted a large number of new retail traders. In May, Robinhood said it had 13 million accounts, up from 10 million at the end of 2019 (Source: https://www.nytimes.com/2020/07/08/technology/robinhood-risky-trading.html). Its customers are reportedly relatively inexperienced, yet tend to trade more frequently and aggressively. According to the above New York Times article, “In the first three months of 2020, Robinhood users traded nine times as many shares as E-Trade customers, and 40 times as many shares as Charles Schwab customers, per dollar in the average customer account in the most recent quarter. They also bought and sold 88 times as many risky options contracts as Schwab customers, relative to the average account size...” The app is also said to encourage more trading through techniques such as push notifications.
As a result of this user growth and aggressive trading, Robinhood traders potentially have an increasing influence on market prices. Robintrack.net is a site that tracks the number of Robinhood accounts that hold a given security. In the notebook below, I used the data from Robintrack to create a Robinhood Account Growth Factor. It simply calculates the 5-day growth in accounts holding a particular security and ranks each stock in the universe on this metric. The idea is that there may be a positive feedback loop driven by popularity: once a stock gets popular on the platform, other users see this through the “trending tickers” widget and start to pile into the name as well.
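To make the factor definition concrete, here is a minimal pandas sketch of the same calculation outside of Pipeline. The tickers and account counts are made up for illustration, and the layout (one row per day, one column per ticker) is an assumption rather than the format of the Robintrack data.

```python
import pandas as pd

# Hypothetical account counts: rows are business days, columns are tickers.
dates = pd.bdate_range("2020-01-01", periods=6)
accounts = pd.DataFrame(
    {
        "AAA": [100, 102, 105, 110, 118, 130],
        "BBB": [500, 500, 498, 497, 495, 490],
        "CCC": [50, 50, 51, 51, 52, 55],
    },
    index=dates,
    dtype=float,
)

lookback_days = 5

# Percent change in accounts holding each ticker over the lookback window.
growth = accounts / accounts.shift(lookback_days) - 1

# Cross-sectional rank on each date (1 = slowest account growth).
growth_rank = growth.rank(axis=1)

print(growth.iloc[-1])
print(growth_rank.iloc[-1])
```

Only the last row has a full 5-day history here, so earlier rows are NaN. In the notebook itself the same ratio is computed inside a CustomFactor window.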
The study uses the QTradableStocksUS universe, which is a U.S. equity universe that filters for liquidity, among other factors. I also filter out closed-end funds, which have slipped through in the past. See the Quantopian documentation for more detail on the universe filtering criteria.
The results of this analysis do suggest that Robinhood account growth is positively correlated with short-term returns over the next week or so. That said, the results may lack robustness as the sample period is very short. The in-sample period spanned about a year (June 5, 2018 to May 31, 2019), while the out-of-sample period was just under a year (July 1, 2019 to May 24, 2020). In the in-sample period, I tested 1, 5, 10, and 20-day percent changes in accounts before settling on the 5-day growth factor. It is notable that the other parameters did have moderately positive results as well.
Performance declined in the out-of-sample period. Unsurprisingly, the returns also became more volatile as overall market volatility picked up substantially. The underperformance during the selloff might suggest that these “popular robinhood” stocks may suffer from an unwinding of positions during volatile times. This exceptional volatility in the out-of-sample period also makes it particularly difficult to disentangle the source of performance decay. It could have been a result of a difficult market regime for the strategy, or it could have been a result of in-sample over-fitting and/or signal arbitrage/decay.
Despite this trade thesis being based on the concept of momentum in popularity, the performance attribution results suggest a portion of the returns is explained by a positive exposure to the short-term mean reversion factor. This aligns with anecdotal evidence that these traders have been prone to buy low-priced, beaten-up names. Robinhood co-founder and co-CEO Vladimir Tenev was quoted by CNBC: “We see a lot of buying activity of specifically industries that were impacted by the pandemic.” Investors traded “a lot in airlines, a decent amount of buying in videoconferencing, streaming services, some biopharmaceutical as well,” (https://www.cnbc.com/2020/06/09/robinhood-traders-cash-in-on-the-market-comeback-that-billionaire-investors-missed.html). That said, specific returns not attributable to common factors appear to be the main driver of alpha.
All analysis was done ignoring commissions and market impact. The goal was to get a sense of the strength of the signal from this single factor, not to build a complete trading strategy.
from collections import OrderedDict
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import alphalens as al
# Typical imports for use with Pipeline
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.research import run_pipeline
# New way: Fundamentals.my_field:
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.user_57bcf8e194cdba3f02000188 import rh_weekdays_only as rh
Shoutout to Joakim Arvidsson as I did borrow a portion of his code in the summarize_null_records function below.
def summarize_null_records(result):
    """Utility function to summarize NaN values."""
    null_summary = pd.DataFrame()
    null_summary["null_records"] = result.isnull().sum()
    null_summary["not_null"] = result.notnull().sum()
    null_summary["total"] = null_summary["null_records"] + null_summary["not_null"]
    null_summary["null_records/total"] = null_summary["null_records"] / null_summary["total"]
    # size() counts every row per date (null or not), so the ratio below is a
    # share of all symbols; count() would have counted only non-null values.
    symbol_count_by_day = result.groupby(level=0).size()
    null_count_by_day = result.isnull().groupby(level=0).sum()
    pct_null_by_day = null_count_by_day.div(symbol_count_by_day, axis=0)
    display(null_summary)
    pct_null_by_day.plot(title="% of Symbols with Null Values by Date");
def get_al_prices(result, periods):
    assets = result.index.get_level_values(1).unique()
    start_date = result.index.get_level_values(0)[0]
    # Extend past the last factor date so forward returns can be computed
    end_date = (result.index.get_level_values(0)[-1]
                + max(periods) * pd.tseries.offsets.BDay()
                + pd.Timedelta(days=5))
    pricing = get_pricing(assets, start_date, end_date, fields="open_price")
    return pricing
def _make_RHAcctGrowth_factor(lookback_days):
    """Make a CustomFactor that calculates the growth in Robinhood
    accounts over the last `lookback_days` days.

    Params
    ------
    lookback_days: int

    Returns
    -------
    RHAcctGrowth: CustomFactor subclass (not yet instantiated)
    """
    class RHAcctGrowth(CustomFactor):
        inputs = [rh.accounts]
        window_length = lookback_days + 1

        def compute(self, today, asset_ids, out, accounts):
            # First row is the oldest observation, last row the most recent
            out[:] = accounts[-1] / accounts[0] - 1

    return RHAcctGrowth
def make_RHAcctGrowth_factors(lookback_list=(1, 3, 5, 10, 20), rank=False, **factor_kwargs):
    """Construct RHAcctGrowth factors for the given lookback days.
    Returns an OrderedDict containing the factors."""
    factors = OrderedDict()
    for lookback_days in lookback_list:
        label = 'rh_growth_{}'.format(lookback_days)
        factor = _make_RHAcctGrowth_factor(lookback_days)(**factor_kwargs)
        if rank:
            label += '_rank'
            factor = factor.rank()
        factors[label] = factor
    return factors
def make_pipeline(base_universe=QTradableStocksUS()):
    # Drop closed-end funds that slip through the base universe
    closed_end_funds = Fundamentals.share_class_description.latest.startswith('CE')
    universe = base_universe & ~closed_end_funds
    factors = make_RHAcctGrowth_factors(lookback_list=[5], mask=universe)
    factors.update(make_RHAcctGrowth_factors(lookback_list=[5], rank=True, mask=universe))
    factors['rh_accounts'] = rh.accounts.latest
    return Pipeline(columns=factors, screen=universe)
Sample Dates: June 5, 2018 through May 31, 2019
# pd.Timestamp is used because pd.datetime was removed in later pandas releases
start_date = pd.Timestamp('2018-06-05')
end_date = pd.Timestamp('2019-05-31')
result = run_pipeline(make_pipeline(), start_date, end_date, chunksize=525)
result = result.replace([np.inf, -np.inf], np.nan)
result.head()
summarize_null_records(result)
Null values become less of a problem through time. As the total number of Robinhood accounts grows, so will the breadth of stocks that customers own.
The below chart is actually included in the full alphalens tearsheet, but I wanted to isolate it because part of my thesis is that the account growth factor values themselves will be autocorrelated. In other words, as users see other Robinhood users buying a stock, it will encourage them to buy the stock as well.
The code below calculates the factor autocorrelation lagged by 5 days. There does seem to be a significant positive autocorrelation, which supports my hypothesis.
growth_rank = result.rh_growth_5_rank.unstack()[5:]
growth_rank_lagged = result.rh_growth_5_rank.unstack().shift(5)[5:]
auto_corr_results = {}
for date, ranks in growth_rank.iterrows():
    auto_corr_results[date] = \
        pd.concat([ranks, growth_rank_lagged.loc[date]], axis=1).corr().iloc[0,1]
auto_corr_s = pd.Series(auto_corr_results)
auto_corr_s.plot(title='Factor Rank Autocorrelation(5)')
plt.axhline(auto_corr_s.mean(), color='r', ls='--');
print("Mean Autocorrelation(5) = {:0.3f}".format(auto_corr_s.mean()))
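As a cross-check on the loop above, the same lagged rank autocorrelation can be sketched in vectorized form with `DataFrame.corrwith`. The data below is synthetic (a persistent per-stock component plus noise, both assumptions for illustration), so the printed number says nothing about the Robinhood factor itself.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
dates = pd.bdate_range("2020-01-01", periods=30)
tickers = ["S{}".format(i) for i in range(50)]

# Hypothetical persistent factor: a fixed per-stock component plus daily noise.
persistent = rng.normal(size=len(tickers))
noise = rng.normal(size=(len(dates), len(tickers)))
factor = pd.DataFrame(persistent + noise, index=dates, columns=tickers)

# Cross-sectional ranks on each date, as in the rh_growth_5_rank column.
ranks = factor.rank(axis=1)
lag = 5

# Row-wise Pearson correlation between each day's ranks and the ranks `lag`
# days earlier; the first `lag` rows have no lagged values and are dropped.
autocorr = ranks.corrwith(ranks.shift(lag), axis=1).dropna()

print("Mean Autocorrelation({}) = {:0.3f}".format(lag, autocorr.mean()))
```

With the notebook's data, `ranks` would be `result.rh_growth_5_rank.unstack()`.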
periods = [1,3,5,10]
quantiles=5
prices = get_al_prices(result, periods)
factor_data_rh = \
    al.utils.get_clean_factor_and_forward_returns(
        result['rh_growth_5_rank'],
        prices,
        quantiles=quantiles,
        periods=periods)
The mean quintile return distribution is monotonically increasing up until the last quintile. This could be noise, but it could also indicate that some mean reversion takes place at the extremes of the distribution, as a large number of "uninformed and unsophisticated traders" push prices past any reasonable fundamental value.
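For readers unfamiliar with how a mean-return-by-quintile chart is built, the following is a rough sketch of the idea: bucket the factor into quintiles within each date, then average forward returns per bucket. It uses synthetic data constructed so that returns rise with the factor, and it is an approximation of the concept, not alphalens's own implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)
dates = pd.bdate_range("2020-01-01", periods=20)
tickers = ["S{}".format(i) for i in range(100)]
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "asset"])

# Hypothetical data: forward returns are built to increase with the factor.
factor = pd.Series(rng.normal(size=len(index)), index=index, name="factor")
noise = pd.Series(rng.normal(scale=0.02, size=len(index)), index=index)
forward_return = 0.01 * factor + noise


def to_quantile(values, q=5):
    """Label each value with its quantile bucket (1 = lowest) within one date."""
    return pd.qcut(values, q, labels=False) + 1


# Quantiles are assigned date by date, so each bucket holds the same number
# of names every day regardless of how the factor's level drifts over time.
quantile = factor.groupby(level="date").transform(to_quantile)

mean_return_by_quantile = forward_return.groupby(quantile).mean()
print(mean_return_by_quantile)
```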
Something that I did not really dig into was the alpha decay/optimal holding period. This was largely intentional as I was not confident there was enough data in the sample to really determine this. However, it could be an area of future research, particularly to see if mean reversion happens at a certain point.
al.tears.create_full_tear_sheet(factor_data_rh)
Out-of-Sample Dates: July 1, 2019 to May 24, 2020
(Note, I left a month buffer between in-sample and out-of-sample periods to prevent any information leakage.)
For brevity, I have used the summary tearsheet for the out-of-sample period. More info on the out-of-sample period can be seen in the Pyfolio tearsheet below.
# pd.Timestamp is used because pd.datetime was removed in later pandas releases
start_date = pd.Timestamp('2019-07-01')
end_date = pd.Timestamp('2020-05-24')
result_o = run_pipeline(make_pipeline(), start_date, end_date, chunksize=525)
result_o = result_o.replace([np.inf, -np.inf], np.nan)
periods = [1,3,5,10]
quantiles=5
prices_o = get_al_prices(result_o, periods)
factor_data_rh_o = \
    al.utils.get_clean_factor_and_forward_returns(
        result_o['rh_growth_5_rank'],
        prices_o,
        quantiles=quantiles,
        periods=periods)
Performance did deteriorate out of sample. Alpha declined while information coefficients shrank toward 0. Notably, beta increased slightly along with market volatility, which is not a desirable characteristic of the factor (i.e., we don't want market beta to increase when the market is going down).
I think the exceptional market volatility in the out-of-sample period makes it particularly difficult to disentangle the drivers of performance deterioration. It could just be a difficult environment for the strategy. It could also be a result of overfitting in the sample period, or it could be decay of the signal from arbitrage.
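Since the information coefficient is central to the comparison above, here is a minimal sketch of how a daily IC can be computed: the Spearman rank correlation between factor values and forward returns, taken cross-sectionally on each date. The data is synthetic and the column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(2)
dates = pd.bdate_range("2020-01-01", periods=20)
tickers = ["S{}".format(i) for i in range(100)]
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "asset"])

# Hypothetical factor with some predictive power over forward returns.
factor = pd.Series(rng.normal(size=len(index)), index=index)
noise = pd.Series(rng.normal(scale=0.02, size=len(index)), index=index)
data = pd.DataFrame({"factor": factor, "forward_return": 0.01 * factor + noise})


def daily_ic(group):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return group["factor"].rank().corr(group["forward_return"].rank())


# One IC value per date; its mean and stability summarize signal strength.
ic = data.groupby(level="date").apply(daily_ic)
print("Mean IC = {:0.3f}".format(ic.mean()))
```

An IC that shrinks toward 0 out of sample, as reported above, means the factor's ranking of stocks lines up less closely with the ranking of their subsequent returns.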
al.tears.create_summary_tear_sheet(factor_data_rh_o)
The backtest was run on an equal-weighted long-short strategy of the top/bottom 20% of names based on the above factor. It was run at 200% gross leverage (100% on each side), and was rebalanced daily 1 hour after the open. Commissions and slippage were set to 0, while the benchmark was the short term treasury ETF (SHY).
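The backtest algorithm itself is not shown in this notebook, so the following is only a sketch of the portfolio construction as described: equal weights in the top and bottom 20% of names, with 100% exposure on each side. The function name and its parameters are hypothetical.

```python
import pandas as pd


def long_short_weights(factor, quantile=0.2, gross_leverage=2.0):
    """Equal-weight the top and bottom `quantile` of names by factor value,
    allocating half of `gross_leverage` to each side."""
    factor = factor.dropna()
    weights = pd.Series(0.0, index=factor.index)
    n = int(len(factor) * quantile)
    if n == 0:
        return weights  # too few names to form a long and a short leg
    ordered = factor.sort_values()
    weights.loc[ordered.index[-n:]] = (gross_leverage / 2) / n   # longs
    weights.loc[ordered.index[:n]] = -(gross_leverage / 2) / n   # shorts
    return weights


# Hypothetical factor ranks for ten names.
factor = pd.Series(range(10), index=list("ABCDEFGHIJ"), dtype=float)
weights = long_short_weights(factor)
print(weights)
```

With ten names, the two highest-ranked names each get +50% and the two lowest each get -50%, for 200% gross and 0% net exposure.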
The most significant factor exposure was to the short term reversal factor which contributed positively to performance, while secondarily, exposure to value detracted as value performed poorly over the period.
bt = get_backtest('5edd2818383612473d6eb3a9')
bt.create_full_tear_sheet(hide_positions=True, live_start_date='2019-07-01')